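Java regex to find all possible occurrences of text starting and ending with ~

I want to find all possible occurrences of text enclosed between two ~ characters.

For example, for the text ~*_abc~xyz~ ~123~, I want the following to be matched: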

  1. ~*_abc~
  2. ~xyz~
  3. ~123~

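Note that the text can consist of letters or digits.

I tried the regex ~[\w]+?~, but it does not give me ~xyz~. I would like the ~ to be reconsidered, so that the ~ closing one match can also open the next one. However, I do not want ~~ on its own to count as a possible match.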


2 answers

  1. # Answer 1

    Use capturing inside a positive lookahead. The technique is explained as follows:

    Sometimes, you need several matches within the same word. For instance, suppose that from a string such as ABCD you want to extract ABCD, BCD, CD and D. You can do it with this single regex:

    (?=(\w+))

    At the first position in the string (before the A), the engine starts the first match attempt. The lookahead asserts that what immediately follows the current position is one or more word characters, and captures these characters to Group 1. The lookahead succeeds, and so does the match attempt. Since the pattern didn't match any actual characters (the lookahead only looks), the engine returns a zero-width match (the empty string). It also returns what was captured by Group 1: ABCD

    The engine then moves to the next position in the string and starts the next match attempt. Again, the lookahead asserts that what immediately follows that position is word characters, and captures these characters to Group 1. The match succeeds, and Group 1 contains BCD.

    The engine moves to the next position in the string, and the process repeats itself for CD then D.
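
    As a minimal sketch of the walkthrough above, the following collects Group 1 at every position where the lookahead succeeds. The class and method names are illustrative assumptions, not part of the original answer.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper class, for illustration only.
    final class SuffixDemo {
        // Returns what Group 1 captured at each position where the lookahead succeeds.
        static List<String> wordSuffixes(String input) {
            Matcher m = Pattern.compile("(?=(\\w+))").matcher(input);
            List<String> out = new ArrayList<>();
            while (m.find()) {        // every match is zero-width
                out.add(m.group(1));  // the text the lookahead saw from this position
            }
            return out;
        }

        public static void main(String[] args) {
            System.out.println(wordSuffixes("ABCD")); // [ABCD, BCD, CD, D]
        }
    }
    ```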

    So, use

    (?=(~[^\s~]+~))
    

    regex demo

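    The pattern (?=(~[^\s~]+~)) checks each position in the string and looks for a ~ followed by one or more characters other than whitespace and ~, followed by another ~. Because the regex index advances only after a position has been checked, and not by the length of the captured value, overlapping substrings are extracted.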
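
    To make the role of the lookahead concrete, here is a small comparison sketch. It runs the same inner pattern once as an ordinary consuming match and once wrapped in a capturing lookahead. The class name and the findAll helper are assumptions made for this illustration.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical comparison class, for illustration only.
    final class ConsumingVsLookahead {
        // Collects the given group from every match of regex in text.
        static List<String> findAll(String regex, int group, String text) {
            Matcher m = Pattern.compile(regex).matcher(text);
            List<String> out = new ArrayList<>();
            while (m.find()) {
                out.add(m.group(group));
            }
            return out;
        }

        public static void main(String[] args) {
            String text = " ~*_abc~xyz~ ~123~";
            // Consuming match: the ~ that closes ~*_abc~ is used up,
            // so it cannot also open ~xyz~.
            System.out.println(findAll("~[^\\s~]+~", 0, text));       // [~*_abc~, ~123~]
            // Zero-width lookahead: nothing is consumed, so every ~ is tried as a start.
            System.out.println(findAll("(?=(~[^\\s~]+~))", 1, text)); // [~*_abc~, ~xyz~, ~123~]
        }
    }
    ```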

    Java demo

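    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String text = " ~*_abc~xyz~ ~123~";
    // The lookahead itself matches nothing; Group 1 holds the ~...~ text.
    Pattern p = Pattern.compile("(?=(~[^\\s~]+~))");
    Matcher m = p.matcher(text);
    List<String> res = new ArrayList<>();
    while (m.find()) {
        res.add(m.group(1));
    }
    System.out.println(res); // => [~*_abc~, ~xyz~, ~123~]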
    

    In case anyone needs it, here is a Python demo

    import re
    p = re.compile(r'(?=(~[^\s~]+~))')
    test_str = " ~*_abc~xyz~ ~123~"
    print(p.findall(test_str))
    # => ['~*_abc~', '~xyz~', '~123~']
    
  2. # Answer 2

    Try this: [^~\s]*

    This pattern excludes the character ~ and whitespace (written as \s).

    I have tested it and it works for your string; here is the demo
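
    As a rough sketch of what this pattern returns on the question's sample, the snippet below runs [^~\s]* and keeps only the non-empty matches, since the * quantifier also produces empty matches at every ~ and space. Note that the results are the pieces between the tildes, without the ~ delimiters, and that the pattern does not itself check that a piece is surrounded by ~. The class and method names are assumptions for illustration.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper class, for illustration only.
    final class InnerTextDemo {
        // Returns every non-empty run of characters that are neither ~ nor whitespace.
        static List<String> nonEmptyMatches(String text) {
            Matcher m = Pattern.compile("[^~\\s]*").matcher(text);
            List<String> out = new ArrayList<>();
            while (m.find()) {
                if (!m.group().isEmpty()) {  // skip the zero-length matches
                    out.add(m.group());
                }
            }
            return out;
        }

        public static void main(String[] args) {
            System.out.println(nonEmptyMatches("~*_abc~xyz~ ~123~")); // [*_abc, xyz, 123]
        }
    }
    ```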